{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**复习：**在前面我们已经学习了Pandas基础，知道利用Pandas读取csv数据的增删查改，今天我们要学习的就是**探索性数据分析**，主要介绍如何利用Pandas进行排序、算术计算以及计算描述函数describe()的使用。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1 第一章：探索性数据分析"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 开始之前，导入numpy、pandas包和数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "#加载所需的库\n",
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "#载入之前保存的train_chinese.csv数据，关于泰坦尼克号的任务，我们就使用这个数据\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.6 了解你的数据吗？\n",
    "教材《Python for Data Analysis》第五章"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.6.1 任务一：利用Pandas对示例数据进行排序，要求升序"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 具体请看《利用Python进行数据分析》第五章 排序和排名 部分\n",
    "\n",
    "#自己构建一个都为数字的DataFrame数据\n",
    "\n",
    "'''\n",
    "我们举了一个例子\n",
    "pd.DataFrame() ：创建一个DataFrame对象 \n",
    "np.arange(8).reshape((2, 4)) : 生成一个二维数组（2*4）,第一列：0，1，2，3 第二列：4，5，6，7\n",
    "index=[2，1] ：DataFrame 对象的索引列\n",
    "columns=['d', 'a', 'b', 'c'] ：DataFrame 对象的索引行\n",
    "'''\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "【代码解析】\n",
    "\n",
    "pd.DataFrame() ：创建一个DataFrame对象 \n",
    "\n",
    "np.arange(8).reshape((2, 4)) : 生成一个二维数组（2*4）,第一列：0，1，2，3 第二列：4，5，6，7\n",
    "\n",
    "index=['2, 1] ：DataFrame 对象的索引列\n",
    "\n",
    "columns=['d', 'a', 'b', 'c'] ：DataFrame 对象的索引行"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "【问题】：大多数时候我们都是想根据列的值来排序,所以将你构建的DataFrame中的数据根据某一列，升序排列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "#回答代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "【思考】通过书本你能说出Pandas对DataFrame数据的其他排序方式吗？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "【总结】下面将不同的排序方式做一个总结"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1.让行索引升序排序"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2.让列索引升序排序\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3.让列索引降序排序"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4.让任选两列数据同时降序排序"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.6.2 任务二：对泰坦尼克号数据（trian.csv）按票价和年龄两列进行综合排序（降序排列），从这个数据中你可以分析出什么？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "'''\n",
    "在开始我们已经导入了train_chinese.csv数据，而且前面我们也学习了导入数据过程，根据上面学习，我们直接对目标列进行排序即可\n",
    "head(20) : 读取前20条数据\n",
    "\n",
    "'''\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "【思考】排序后，如果我们仅仅关注年龄和票价两列。根据常识我知道发现票价越高的应该客舱越好，所以我们会明显看出，票价前20的乘客中存活的有14人，这是相当高的一个比例，那么我们后面是不是可以进一步分析一下票价和存活之间的关系，年龄和存活之间的关系呢？当你开始发现数据之间的关系了，数据分析就开始了。\n",
    "\n",
    "当然，这只是我的想法，你还可以有更多想法，欢迎写在你的学习笔记中。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**多做几个数据的排序**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#写下你的思考\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.6.3 任务三：利用Pandas进行算术计算，计算两个DataFrame数据相加结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 具体请看《利用Python进行数据分析》第五章 算术运算与数据对齐 部分\n",
    "\n",
    "#自己构建两个都为数字的DataFrame数据\n",
    "\n",
    "\"\"\"\n",
    "我们举了一个例子：\n",
    "frame1_a = pd.DataFrame(np.arange(9.).reshape(3, 3),\n",
    "                     columns=['a', 'b', 'c'],\n",
    "                     index=['one', 'two', 'three'])\n",
    "frame1_b = pd.DataFrame(np.arange(12.).reshape(4, 3),\n",
    "                     columns=['a', 'e', 'c'],\n",
    "                     index=['first', 'one', 'two', 'second'])\n",
    "frame1_a\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "scrolled": false
   },
   "source": [
    "将frame_a和frame_b进行相加\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "【提醒】两个DataFrame相加后，会返回一个新的DataFrame，对应的行和列的值会相加，没有对应的会变成空值NaN。<br>\n",
    "当然，DataFrame还有很多算术运算，如减法，除法等，有兴趣的同学可以看《利用Python进行数据分析》第五章 算术运算与数据对齐 部分，多在网络上查找相关学习资料。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.6.4 任务四：通过泰坦尼克号数据如何计算出在船上最大的家族有多少人？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "'''\n",
    "还是用之前导入的chinese_train.csv如果我们想看看在船上，最大的家族有多少人（‘兄弟姐妹个数’+‘父母子女个数’），我们该怎么做呢？\n",
    "'''\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "【提醒】我们只需找出”兄弟姐妹个数“和”父母子女个数“之和最大的数，当然你还可以想出很多方法和思考角度，欢迎你来说出你的看法。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**多做几个数据的相加，看看你能分析出什么？**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "#写下你的其他分析\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.6.5 任务五：学会使用Pandas describe()函数查看数据基本统计信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "#(1) 关键知识点示例做一遍（简单数据）\n",
    "# 具体请看《利用Python进行数据分析》第五章 汇总和计算描述统计 部分\n",
    "\n",
    "#自己构建一个有数字有空值的DataFrame数据\n",
    "\n",
    "\n",
    "\"\"\"\n",
    "我们举了一个例子：\n",
    "frame2 = pd.DataFrame([[1.4, np.nan], \n",
    "                       [7.1, -4.5],\n",
    "                       [np.nan, np.nan], \n",
    "                       [0.75, -1.3]\n",
    "                      ], index=['a', 'b', 'c', 'd'], columns=['one', 'two'])\n",
    "frame2\n",
    "\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "调用 describe 函数，观察frame2的数据基本信息"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 1.6.6 任务六：分别看看泰坦尼克号数据集中 票价、父母子女 这列数据的基本统计数据，你能发现什么？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "'''\n",
    "看看泰坦尼克号数据集中 票价 这列数据的基本统计数据\n",
    "'''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "#代码\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "【思考】从上面数据我们可以看出，试试在下面写出你的看法。然后看看我们给出的答案。\n",
    "\n",
    "当然，答案只是我的想法，你还可以有更多想法，欢迎写在你的学习笔记中。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**多做几个组数据的统计，看看你能分析出什么？**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 写下你的其他分析\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "【思考】有更多想法，欢迎写在你的学习笔记中。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "【总结】本节中我们通过Pandas的一些内置函数对数据进行了初步统计查看，这个过程最重要的不是大家得掌握这些函数，而是看懂从这些函数出来的数据，构建自己的数据分析思维，这也是第一章最重要的点，希望大家学完第一章能对数据有个基本认识，了解自己在做什么，为什么这么做，后面的章节我们将开始对数据进行清洗，进一步分析。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "433px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
